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Abstract 

Reduced models for the (defocusing) nonlinear Schrodinger equation are developed. In particular, 
we develop reduced models that only involve the low-frequency modes given noisy observations of 
these modes. The ansatz of the reduced parametric models are obtained by employing a rational 
approximation and a colored noise approximation, respectively, on the memory terms and the 
random noise of a generalized Langevin equation that is derived from the standard Mori-Zwanzig 
formalism. The parameters in the resulting reduced models are inferred from noisy observations 
with a recently developed ensemble Kalman filter-based parameterization method. The forecasting 
skill across different temperature regimes are verified by comparing the moments up to order four, 
a two-time correlation function statistics, and marginal densities of the coarse-grained variables. 
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I. INTRODUCTION 


An important scientific problem in applied sciences is to forecast some quantity of interest 
of dynamical ^sterns that exhibit multiscale behavior. Traditional approaches (see e.g., 
review paper [l|) often assume some knowledge about the underlying dynamics and proceed 
by deriving an effective equation for a set of preselected variables. Instead of working with 
the trajectories associated with the full solutions, one is interested in a reduced model in 
which only the quantities of interest are involved. These quantities of interest are generally 
referred to as the coarse-grained variables. In the case when the dynamics of the coarse¬ 
grained variables is of primary interest, the effective model provides an efficient means to 
simulate directly the coarse-grained variables, without having to keep track of the remaining 
degrees of freedom. 

An elegant framework for deriving the effective model is the Mori-Zwanzig projection 
a. which ha= recently hecome nn extremely importnnt tool to enrrphfy complex dynamtcal 
systems |5Nl5|. In particular, this derivation led to a set of generalized Langevin equation 
(GLEs), a typical result of the Mori-Zwanzig procedure. A notable feature of the GLE is a 
memory term which represents the history-dependence of the effective dynamics, along with 
a random noise term, which incorporates the influence of the remaining degrees of freedom. 
Unfortunately, solving the resulting GLE still remains as a challenge. For example, the 
memory function has been expressed as an inhnite series [^, and it may exhibit very 
slow decay. The implication is that a long history of the solution has to be kept in order to 
evaluate the integral in the GLE. The evaluation of the integral has to be done at every time 
step, which adds great complexity to the entire computation. Furthermore, incorporating 
the random noise term is not straightforward. 


A simple approach proposed by 


7, 3, 181 is to approximate the memory kernel in the GLE 
model with a delta function (but with a carefully chosen damping parameter). This certainly 
introduces additional modeling error that is difficult to quantify. In problems that arise from 
biological systems, the memory function in the GLE can be computed by matching the auto¬ 
correlation function of the coarse-grained variables. For instance, for the GLEs derived from 
Newton’s equations of motion in classical mechanics, one can derive an integral equation 


for the memory function 


[lO, 19, 


20l |. But this approach requires the computation of the 


velocity correlation function for the full model, which clearly is a challenge. Furthermore, 
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the solution procedure for the integral equation is often not reliable. Another approach 
to approximate the GLE is by using an extended Markoviai i sy stem, which can be done 
using a projection to the Krylov subspace approximation jl4, 21|. This approach, however, 
requires the knowledge of the full model, especially the interaction among all the degrees of 
freedom. But this approach suggests that the full GLE models with strong memory effects 
can be approximated by an extended system with a few auxiliary variables and this key 
result motivates the present work. 

The main idea of the present work is to apply a rational approximation to the kernel 
function in the GLE and a colored noise approximation to the orthogonal dynamics in the 
GLE such that the resulting parametric model is Markovian. We subsequently use the 


stability conditions established in 


22| as guidelines to ensure non blow-up solutions in the 


resulting models. Rather than deriving the explicit dependence of the parameters in the 
resulting Markovian models in terms of the true solutions (and/or the parameters in the 
original dynamics), we estimate these parameters by solving an inverse problem, hltering 
partially observed noisy measurement of the dynamics. This approach is often useful when 
(a) there is a large amount of training data, e.g., from experimental observation of part 
of the system; (b) the explicit form of the GLEs is difficult to obtain; (c) we don’t have 
access to the exact solutions of the full dynamical systems. Gomputationally, since the 
resulting model is Markovian, we don’t need to explicitly compute the memory terms and 
we don’t need to store the solution history. More importantly, compared to direct numerical 
approximation of the integro-differential equations associated with the GLE model, solving 
the reduced parametric system requires much less computational cost. 

We will demonstrate our modeling approach on the nonlinear Schrodinger equation (NLS), 
which Ends many applications in various areas of applied physics. Of our particular interest 


is the statistical-mechanics aspects, which has been well studied theoretically 


23|-l25j. Our 


goal is to predict the equilibrium statistical behavior of the low-order wave numbers. We 
should stress out that developing reduced models for the NLS equations is highly nontrivial 
in the following sense. Since the solutions of NLS equation exhibit strong correlation time 
with nontrivial autocorrelation function, the memory feedback from the unresolved scales is 
non-negligible and need to be appropriately accounted. Moreover, the equilibrium statistics 
of the solutions are highly non-Gaussian with bimodal distribution. We will proceed by 
applying a rational approximation and a colored noise approximation, subsequently, to the 
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kernel functions and orthogonal dynamics of a GLE, derived by Chorin and coworkers 16 |. 
Subsequently, we estimate the parameters of the resulting model with an adaptive parameter 
estimation scheme that is recently developed in 2^. We will then validate the forecasting 
skill by comparing moments up to order four, a two-time correlation function statistics, and 
marginal densities of the coarse-grained variables. 

The remaining part of the paper is organized as follows. In Section [TTl we state the 
problem and provide a short review of the GLE deduced by Ghorin and coworkers 161. 
In Section IIIIl we construct the parametric models. The procedure for the parametric 

26| is formally described in Section [IVl To be self-contained, we 


estimation method in 


include a pseudo-algorithm in the appendix. In Sections IVTlVIl numerical results are then 
presented to demonstrate the effectiveness of the reduced models. We close this paper with 
a short summary and discussion in Section IVIII 


II. PROBLEM STATEMENT AND BACKGROUND 

We consider the nonlinear Schrodinger equation (NLS), 

iut = -u^x + \u\‘^u, ( 2 . 1 ) 

in one space dimension. For simplicity, we apply a periodic boundary conditions on a non- 
dimensionalized domain xG[0,27r]. Here, the solutions of (12.1 p can be described by the 
Fourier series, 

u{x,t) = '^Uk{t)e^^^. ( 2 . 2 ) 

fcez 

This turns the PDE into a set of ODEs for the Fourier modes, 

EE —fc! (2-3) 

fcieZA; 26 Z 

with dispersion relation given by, ujk = k‘^- 

Of particular importance to the statistical mechanics interpretation of (12.ip is the Hamil¬ 
tonian structure of the system, with the Hamiltonian given by, 

E = Eq + El, 
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where, 


kez 

'Ukx'Uk2'Uks,'^k]_+k2-k3- 

^ fcieZfe 26 Zfc 3 ez 

With this Hamiltonian, we can rewrite fl2.3p as follows. 


(2.4) 


. d dE 


Numerically, we can simulate the solutions of fl2.4p with pseudo-spectral methods, e.g. [27|, 
of fl2.4p for hnite wave numbers, \k\<K. The initial condition can be prepared using a 
Monte-Carlo algorithm. We assume that the resulting solutions are the underlying dynamics. 

In this paper, we are interested to construct a low-dimensional parametric model to 
predict low-frequency modes of fl2.4p . given noisy observations of the corresponding modes 
at discrete-times. Namely, 


Vk,j = Uk{tj)+e°, \k\<m, 


(2.5) 


where m denotes the upper bound of the observed/resolved modes that are much smaller 
than the dimensionality of the underlying dynamics K, m<^K. In fl2.5p . the noises e° are 
i.i.d. Gaussian with mean zero and unknown error covariance, R. To achieve this goal, 
our strategy is to exhaust our physical knowledge of the model to deduce an appropriate 
ansatz for the parametric model and then apply a recently developed, adaptive ensemble 
Kalman hlter based, parameter estimation method 2^ to specify the parameters in the 
corresponding model as well as the observation noise covariance, R. 

Before we discuss our main strategy, we briefly review a classical dimensional-reduction 
Mori-Zwanzig formalism { 2 -^, which underpins the choice of ansatz for our parametric 
models in the remaining of this section. 


A. Reduced Models from the Mori-Zwanzig formalism 


A general framework for reducing the dimension associated with a complex dynamical 
system is the Mori-Zwanzig projection formalism j^M], which was originally developed to 
deal with non-equilibrium processes in statistical mechanics. This approach relies on a pro¬ 
jection operator, denoted by V, which separates out the quantities of interest and identihes 
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terms of different nature. In particular, for a system of initial value problem in the form, 


x = f(x), x{0) = z, 


( 2 . 6 ) 


and an arbitrary reduced quantity,^, which is a function of x{t), the Mori-Zwanzig procedure 
yields an exact equation for 99 I 2 , 2], 


^(p{t)=e“'PC(p{0)+ I e^‘ ‘^‘K{s)ds + l;{t), 


(2.7) 


where the first term in fl2.7p usually represents the reversible part of the dynamics and 
it represents the “Markovian” term. Here, the differential operator C corresponds to the 
generator of the dynamical system in fl2.6p and it is defined with respect to initial condition 
z as follows. 


^=Ef- 


A 

' dzi ’ 


( 2 . 8 ) 


and we use semigroup notation to denote the evolution operator that maps the solutions 
forward in time as follows, (p{t) =e^^(p{0). The second term depends on ip at all times 
between 0 and t so it incorporates the memory effect as a result of coarse-graining, and it 
dictates a strong coupling with the remaining degrees of freedom through a memory kernel. 


K{t)=vcm. 


(2.9) 


where 

^(t) = e*2^Q£(p(0), Q = I-V. (2.10) 

The term in fl2.10p is referred to as the orthogonal dynamics and if the initial condition 
2 ; is random, then ^{t) is a stochastic forcing. Equation fl2.7l) is often called a generalized 
Langevin eguation (GLE). The most appealing aspect of the GLE in 02.71) is that it is exact. 
However, solving the GLE in 02.7p directly is not much simpler than solving the full system 
in 02 .op since one has to estimate the orthogonal dynamics in 02 .10p and the memory kernel 
function in 02 .Op . 

We should point out that the GLE for the ODE in 02.61) is nonunique since there are 
different choices for the projection operator V. For example, in the work of Mori j^, an 
orthogonal projection is employed, which is often appropriate when the problem can be 
formulated in a Hilbert space. For the NLS equation in 02 .4p . Ghorin et al Q used a 
projection operator that is the conditional expectation with respect to the canonical en¬ 
semble p (X to deduce an effective equation for selected (low-frequency) Fourier modes. 
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\k\ <m. This choice is motivated by the statistical mechanics aspect of the NTS 2^. Since 
the calculation is usually quite cumbersome, an expansion around the Gaussian distribution 
Po(xe~^^° was introduced, which is appropriate for systems at low temperature, To 

see this, one can introduce a change of variables in the Gibbs distribution, v = y/Pu. As 
a result, the distribution can be written as pecAt low temperature when 
/5 3>1, the distribution is approximately Gaussian. Furthermore, higher order terms in the 
equation are much less important, since statistically, u is of the order 

In the simplest case when m = 0, only the zeroth Fourier mode is retained and the effective 
equation takes the form of H. 

uo = -icuo-i\uo\^uo+ / Ko{t-T)uQ{T)dT+ i / 0o(i-T)|Mo(r)pMo(r)dr + ^(t), (2.11) 


where c is a positive constant, and kq and (f)Q are complex valued kernel functions with 
complicated expressions 1^ (they are written as infinite series). Furthermore, in solving 
fl2.1ip . the history of the solution has to be stored and the integral has to be approximated 
by appropriate quadrature formulas at each step of the time integration. All these operations 
add up to signiheant computational cost and it is also unclear how the approximation by po 
affects the modeling error. 

Rather than computing these kernels directly, we take a different approach here. In 
particular, we will model the memory terms in fl2.1ip and the stochastic process ^ with an 
appropriate ansatz of parametric equations. Subsequently, we estimate the corresponding 
parameters from noisy observations (12. 5 p such that the resulting Markovian model gives 
accurate equilibrium statistical estimates for the selected Fourier modes that are retained: 
|A;| < m. 


III. CONSTRUCTING PARAMETRIC MODELS 

Here we discuss our approach in approximating the GLEs using parametric models that 
involve explicitly few parameters. These approximations are constructed in such a way that 
the approximate model can be re-written into a memory-less form, leading to a Markovian 
dynamics to facilitate the numerical implementation. To clarify the exposition, we first 
discuss the case where we only retain the zeroth mode. We assume the form of the GLE 
fl2.1ip . but we approximate the memory terms using rational functions so that parameters 
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can be introduced. We then provide the resulting parametric form for the more general case 
which retain more Fourier modes, 0 < |fc| <m. 


A. A reduced model for uq with scalar parametric approximation 


Now we will construct an ansatz for approximating the hrst memory term in fl2.11|) . To 
this end, we introduce a parameter 6 G C and an auxiliary function / to denote the hrst 
memory term, 

bf-= f Ko{t-T)Uo{T)dT, (3.12) 

Jo 

and our plan is to hnd a set of differential equations for solving /. First, taking the Laplace 
transform on (13.1211 . we arrive at, 

bf{s) = ko{s)uo{s), (3.13) 


where we denote h to be the Laplace transform (dehned on frequency domain s) of any 
function h that is locally integrable on M’*'. The key idea is to approximate the kernel 
function, kq, using a rational function. 


Ko(s) 


-\b? 


(3.14) 


where a G C is the second parameter to be determined. This particular form of the rational 
function is chosen to ensure the stability of the resulting parametric model, as we will 
explain below. In principle, these two coefficients, a and b, can be determined with Fade 


approximations (or more general rational approximations) of the exact kerne 


reduction problems, this is known as the moment matching procedure 


28|, 


In model 


29l |. where for 


linear dynamical systems, these parameters can be explicitly connected to properties of the 
original problem. The main departure of the current approach from those existing methods 
is mainly that we leave them as parameters and later infer them with a hltering procedure, 
learning from partially observed noisy time series. 

Converting fl3.13p and fl3.14p back to the time domain, we hnd that / satishes a differential 
equation, 

f = af-b*Uo{t), /(0) = 0. (3.15) 


For low temperature case, one can neglect the second memory term in (12.111) that involves 
'ipo since this higher-order term is negligible as explained before in Section III A1 With this 
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perspective, we propose the following parametric model, 


Uo = — icuo — id\uo\‘^Uo + bf, 
f =af-b*Uo + aiWf, 


(3.16) 


where we have introduced two additional non-negative parameters c and d. In principle, 
these parameters can be determined from the Mori-Zwanzig reduction procedure, which 
might involve lengthy calculations. However, since the derivation in 16| employed further 
approximations using the (conditional) Gaussian distribution, the resulting values for c and 
d may not be optimal. Therefore, we kept the form of the equations suggested by the Mori- 
Zwanzig formalism, but leave c and d as additional parameters, which we will determine 
using a hltering procedure. 

We have also introduced a white noise Wf. When the second equation is analytically 
solved and subsequently substituted into the hrst equation, this white noise will become a 
colored-noise approximation to the random process ^{t). When both a and b are real-valued 


parameters, the second equation represents an Ornstein-Uhlenbeck process 


301. But here / 


is a more general Gaussian process. We should also point out that the reduced system of 
parametric equations in (I3.16j) is a special case of the physics constrained nonlinear regression 


model described in 


22l | in the following sense. In compact form, we can write 03.161) as a 


system of four-dimensional real valued SDEs, 


dx = [Ax -f iV(x)] dt + T,dW, 


(3.17) 


where we dehne x = (Re{Mo},Im{Mo},Re{/},Im{/})^, and W is a standard two-dimensional 
Wiener process. In addition, we dehne a = ai+ia 2 and b = bi+ib 2 such that. 


A = 


/ 0 c bi —b2\ 
-c 0 62 bi 
—bi —62 Q-i —0,2 

\ 62 -bi 02 Oi J 

One of the main results in 


N{x) = d 


^ Iwoplmtio ^ 
-l^opReMo 
0 
0 


E = 


/ 


/ 0 0 \ 
0 0 
^ 0 
Vo 


(3.18) 


22| states that if the Fokker-Planck operator of the SDE in 


03.17P is hypoelliptic, and suppose also that all eigenvalues of A have negative real part and 
there exists an appropriate norm under which the inner product {N{x),x) = 0, then solutions 
of fl3.17p is geometrically ergodic. For our parametric model above, the stability condition 
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is met when ai < 0 and the dissipation of the energy of the nonlinear terms is satisfied under 
an inner product with respect to L = E + ^\f\‘^. 

We should note that one can repeat the same calculation for approximating the second 
memory term in (I2.1ip but the resulting nonlinear terms will not conserve energy and can 
be unstable (see Appendix A). Based on this consideration, we ignore approximating the 
second memory terms in this paper. Instead, we will only consider the parametric model in 
fl3.16p which guarantees non-blowup solutions. 


B. A reduced model for uq with multi-dimensional parametric approximations 


A simple extension of the two-parameter scalar parametric approximation model in fl3.16p 
is to allow b and / to be vectors, here denoted by b and /, respectively, to emphasize the 
multi-dimensional representation. This leads to an extended model. 


Uo = -ic-Uo-id\uo\‘^Uo + b-f, 

f=Af-b*uo + J:W. 


(3.19) 


For instance, the matrices A and S can be chosen in the following form. 


A = 

Ui a2 


(Ji 0 

) ^ — 


— Qj2 


0 (72 


(3.20) 


The corresponding extended model has a parameter space of dimension 10. We will see 
that this model will give improved estimates compared to fl3.16p in higher temperature case. 
Similar extensions can be found by increasing the dimension of A, b and /. 


C. Models with more retained Fourier modes 


In general, we can keep those modes k with \k\ <m, and m indicates the range of the 
modes to be kept. In this case, we first define the coarse-grained energy, 

1 










(3.21) 


|fc|<m |A:i|<m \k2\<m\k3\,\ki+k2—k3\<m 

The model without memory can be written as follows, 

.dE 


Uk — 1 


dul 


—m <k<m. 


(3.22) 
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B. 


Motivated by fl3.16p and the Mori-Zwanzig procedure let us consider a parametric model 
as follows, 


dE 


Xlj^ — ‘ 2 ' 


dul 

dE 


-bkfk, 


(3.23) 


fk = -bl-^ + akfk + (JkWk, -m<k<m. 


The auxiliary functions are assumed to be zero initially, i.e., /^(O) = 0, since they are intro¬ 
duced to approximate the memory terms. 

To see the energy dissipation mechanism, we can dehne the Lyapunov functional. 


V^E + Y^\f, 


(3.24) 


Direct calculations yield. 


d 


^r = 5^Re(aOIAr. 


(3.25) 


Consequently, we require that Re(afc) <0 to guarantee non-blow up solutions. 


IV. THE PARAMETER ESTIMATION PROCEDURE 

In this section, we describe formally how to estimate the parameters of the reduced models 
(e.g., fl3.16p . fl3.19p . or f|3.23p ). given noisy observations Vj = {v-m,jT--,Vm,j)~^ of 

Uj = {u_rn,j, ■ ■ ■ ,Um,j)~^ , where Ukj = Uk(tj) are solutions of the full system in fl2.3p 

for \k\<K and at discrete time step tf 

Vj=Uj + ej, ej~AA(0,i?), \k\<m, (4.26) 

with an unknown observation error covariance R, where R is an (2m-|-l) x (2m-|-l) diago¬ 
nal matrix with /c-th diagonal component, r^. To simplify the discussion, let us classify the 
parameters in our reduced model to two types. We refer to the parameters in the determin¬ 
istic terms in the reduced models as the “deterministic parameters”, 9^, and the amplitude 
of the stochastic forcings as the “stochastic parameters”, 6s- For example, in fl3.16p . the 
deterministic and stochastic parameters are, 9^ = {a,b,c,d}, and 9s = {<jI,R}, respectively. 
We split the parameters into two types because the algorithm that we use to estimate 9^ is 
simply a standard augmentation method while the algorithm to estimate 9s is the adaptive 
method for estimating covariances which preserves the positivity of af and R. Let us also 
dehne vector = (M_mj,-■ • ■ • j/mj)"'^ £to simplify the notation below. 
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The main idea of the parameterization method is to apply Bayes’ theorem to obtain a 
posterior distribution of the augmented state and parameters at each time step tj when 
observations become available, 


p{xj,9d,0s\vj) (xp{xj,9d,0s)p{vj\xj,9d,9s), (4.27) 

where p{xj,9d,9s) denotes the prior distribution of the augmented state and parameters at 
time tj and p{vj\xj,9d,9s) denotes the likelihood function of the augmented state and param¬ 
eters, corresponding to the observation model in 04.261) . that is, p{vj\xj,9d,9s) =N'{uj,R). 
The parameterization method can be formally described as follows: Since p{xj,9d,9s) = 
p{9s)p{xj,9d\9s) by dehnition of the conditional distribution, we can rewrite 04.271) as fol¬ 
lows: 


p{xj,9d,9s\vj)(xp{9s)p{xj,9d\9s)p{vj\xj,9d,9s), (4.28) 

(xp{9s)p{xj,9d\vj,9s), (4.29) 

where we use another Bayes’ theorem, p{xj,9d\vj,9s) (xp{xj,9d\9s)p{vj\xj,9d,9s), to obtain 
04.29p . Here, the first step in the filtering algorithm is to estimate p{xj,9d\9s,yj) by applying 
Bayes’ theorem to the last two components of 04.28p . Subsequently, we implement the 
Bayes’ theorem one more time in 04.291) to obtain the posterior distribution of the augmented 
(^Xj , 9d, 9g^. 

To avoid unobservability of the stochastic parameters due to sparse observations with 
dimension less than the number of stochastic parameters, 9s, in our implementation, we 
include information from past observations up to lag L > 1. At each time step tj, instead of 
solving (I4.28p - (l4.29p . we formally solve 

p{xj,9d,9s\vj,...,Vj-L+i) (xp{xj,9d,9s\vj-i,...,Vj-L+i)p{vj\xj,9d,9s) 

(xp{9s)p{xj,9d\9s,Vj_i,...,Vj_L+i)p{vj\xj,9d,9s) (4.30) 

(xp{9s)p{xj,9d\9s,Vj,...,Vj-L+i), (4.31) 

where, similar as before, the first step is to estimate p{xj,9d\9s,Vj,... ,Vj-L+i) by applying 
Bayes’ theorem to the last two components of (I4.3np . Subsequently, we implement the 
Bayes’ theorem one more time in fl4.31l) to obtain the posterior distribution of the augmented 
{,Xj, 9d, 9 s')- 
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In our numerical implementation, we use the method in 2^ which uses Gaussian ap¬ 
proximation to solve these inverse problems. At time j > L, we assume that we have prior 
ensemble estimates of {xj,9d,9s} at times j,j — 1,... ,j — L + 1 and the associated observa¬ 
tions at these times. We assume that the deterministic parameter is persistence, that is, 
9d = 0. The hrst step is to apply ETKF method to obtain posterior ensemble estimates 
of the augmented variable {xj,9d}, incorporating observations in fl4.26p . which is a Gaussian 
approximation of p{xj,9d\9s,Vj,... ,yj_L+i)- To start the algorithm, one can just repeat this 
ETKF algorithm L-times to obtain the prior ensemble estimates at time j = l,...,L with 
hxed parameters {9s,9^,}. Now, at j > L, we start the secondary hlter to update 9s. The key 
idea of the secondary hlter is to view the posterior density function p{xj,9d\9s,Vj,... ,Vj_L+i) 
as a likelihood function of 9s. Notice that while this posterior density is Gaussian with 
respect to variables (xi,9d), its dependence on 9s can be described non-uniquely (for ex¬ 


ample, see 


26 


32 


36|). Here, we will adopt the estimation method of 26(] that is based 


on Belanger’s formulation 3^ with a likelihood function corresponding to the following 
pseudo-observation model. 


(^j,e = J^j,£ds + Vj,£, Pj,er^Af{0,Wj^i), i = j,...,j-L + 1 


(4.32) 


Here, components of = {ejej_^} are the product of the forecast error estimates in the 
observation space (which are also known as innovations). 




(4.33) 


where uj denotes the mean prior estimate that is empirically estimated with an ensemble 
average. In (I4.32p . the observation operator and the noise covariance matrix Wj^c are 
functions of xj_f^ and 9d and they will be constructed recursively. We should also note that 
in our implementation, Wj^i is approximated under a Gaussian assumption (see Appendix B 
below for detail). With the pseudo-observation model in fl4.32p . a secondary Kalman hlter is 
implemented L-times to sequentially update the posterior mean and covariance estimate of 
9s, accounting for pseudo-observations {<Jj/}e=i...,L one at a time. To be self-contained, we 
provide a pseudo algorithm of this method in Appendix B below. We should note that there 
are other methods to approximate the secondary Bayes’ update in (14.31 ) that use different 
observation model in fl4.32p and do not use Kalman update (see e.g., |35l. l36|). 
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V. NUMERICAL RESULTS ON PARAMETER ESTIMATION OF MODELS 
WITH A SINGLE FOURIER MODE uq 


In this section, we present the results from three numerical tests, where the parametric 
models for uq (equations fl3.16p and fl3.19p ) are estimated and further assessed. We as¬ 
sume that the observation time interval To6s = 0.02. The time series, consisting of 100,000 
observations, is generated by using the Strang’s splitting method in time, which has been 


implemented in 


27| for the NLS equation. The data is generated from the solutions of the 


NLS in the Fourier domain fl2.3p . with K = 32. 

We then perform a parameter estimation method with an ensemble Kalman filter based 
method 2^, which was described in the previous section. The forecast is generated with 


the 4th order Runge-Kutta method with step size At = 0.002. 

Following the estimation, we verify the forecasting skill of the reduced models with the 
estimated parameter set as follows. We take the estimated parameters and run the reduced 
models forward in time for a sufficiently long time. Then, based on the long trajectory, we 
compare the low-order statistics up to order-four and the time correlation function. The 
auto-correlation is computed based on the Wiener-Khinchin theorem. Namely, we take 
the Fourier transform of the data, x{t), and compute the power spectrum, |a;(a;)p. The 
correlation function is then given by the inverse Fourier transform of the power spectrum. 
This procedure is often more efficient than the direct approach, i.e., 

M 




m=l 


A. Low temperature/3 = 10^. 

We first consider a low temperature case with /? = lO'^ {ksT = 0.0001), and estimate the 
parameters in the model fl3.16p for uq. This model contains two (complex) deterministic 
parameters a and b, together with two real valued parameters c and d. We set the observation 
noise error with variance, R = 0.01. In Fig. [H we show the estimated solutions along 
with the observed values during the estimation procedure. Very good agreement with the 
true values has been found. We also monitor the predicted values of the deterministic 
parameters, and the history is presented in Fig. [2l Meanwhile, the stochastic parameter ai 
has settled to a constant value, and the variance of the observation noise has been correctly 
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FIG. 1. State estimates u and / for the parametric model (|3.16l) . 


predicted, as indicated in Fig. [3l Another observation is that the predicted values of 
the parameters c and d are c=—0.0067 and <7 = 0.0024, which are quite different from the 
values calculated from the Mori-Zwanzig’s projection procedure (0.00063 and 1, respectively) 
corresponding to an approximate Gaussian measure, We should point out that if 

we £x these two parameters to be those from the Mori-Zwanzig projection and run the 
hltering procedure to estimate the remaining parameters, a,b,ai,R, the resulting estimates 


are completely innaccurate. This suggests that while the perturbation approach 


16| suggests 


the explicit forms of the reduced model, it is more natural to adaptively estimate all the 
parameters which reconhrms the results in j^. Moreover, in general, there can be non¬ 
unique parameters that provide the same equilibrium statistics (for e.g., see Proposition 
1(d) in 


26|). 


Next we evaluate the forecasting skill and check the accuracy of the climatological statis¬ 
tics of the resulting solution uq. We observe from Fig. IHthat the qualitative behavior of the 
path-wise solutions is well captured. The solution /, which was introduced to replace the 
memory term, exhibits much faster oscillations. Further, from Fig. |5l we observe that the 
distribution and the time correlation of the true solution are accurately predicted. We also 
report the accuracy of the hrst four moment estimates in Table HI where the errors are on 
the order of 10“^. 
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FIG. 2. Deterministic parameter estimates, a, b, c and d. 
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FIG. 3. Stochastic parameter estimates, ai,R. 


B. Results for a higher temperature /3 = 10. 

We now turn to observations obtained from a higher temperature simulation with /3 = 10 
{ksT = 0.1). In this case, the time series for uq exhibits faster oscillations and slightly larger 
amplitude. We also observe the amplitude and frequency of the oscillations are somewhat 
sensitive to the initial conditions since the system is not ergodic. 
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FIG. 4. Solutions of the reduced model in (j.S.ldl) . integrated with the estimated parameters. 



FIG. 5. The marginal distribution (left) and the time correlation function (right) predicted by 
the reduced mode (j3.16l) for low temperature, /3 = 10^. As comparison, the statistics of the true 
solution is also shown. 
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FIG. 6. Predicted marginal distribution and correlation function for /3 = 10 using the first reduced 
model (j.S.lOp . 


Based on the data, we estimate the first reduced model (13.161) . and then check the statis¬ 
tics of the resulting model. Due to the higher temperature, the variance of Uq is much 
bigger. Therefore, we set a higher value for observation noise variance, i? = 0.1. We see 
from Fig. [6] that the accuracy is not as satisfactory as in the previous case. In particular, 
the peaks of the marginal density are not well captured and the variance is underestimated 
(see Table [Tl) although the other statistics are accurately estimated. Also, the correlation 
function is inaccurate beyond the hrst oscillation. 

As comparison, we consider the next parametric model, represented by the equation 
(I3.19p . which contains 4 complex deterministic parameters and two real ones. With the pa¬ 
rameters obtained from the hltering procedure, we perform a similar statistical verihcation. 
The results, including the histogram and the time correlation functions, are illustrated in 
Fig. [71 It is clear that the extension has offered improved accuracy in the resulting his¬ 
togram. Table H] summarized the statistics (four moments) of uq obtained from the three 
tests, compared to the true values. We notice that for the higher temperature case, the 
model (13.191) yields much better estimates for the second moment. However, the estimated 
correlation function is only slightly improved up to time 500 relative to the result from the 
model in fl3.16p . 
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FIG. 7. Predicted marginal distribution and correlation function for /3 = 10 using the reduced 
model (j3.19p . 


TABLE I. Comparison of the equilibrium statistics of Re(rio) for the three tests. 



Model ifvmp. d = in4 

Model IfTTop. d = 10 

Model IfTTop. d=10 

Statistics 

Truth 

Estimate 

Truth 

Estimate 

Truth 

Estimate 

mean 

-0.0037 

-0.0717 

0.0457 

0.0368 

0.0457 

-0.0642 

variance 

2.4018 

2.4570 

8.0655 

5.3181 

8.0655 

8.5148 

skewness 

0.0840 

0.0658 

-0.0251 

-0.0204 

-0.0251 

0.0371 

kurtosis 

1.5071 

1.5123 

1.4998 

1.5160 

1.4998 

1.5062 


VI. NUMERICAL RESULTS FOR MULTIPLE RETAINED FOURIER MODES 

In this section, we consider modeling three modes, k_i, Mq, and Mi, in the Fourier series 
for much higher temperature case with /5 = l/20 {ksT = 20). In this numerical experiment, 
the parametric model in fl3.23p has 6 dimensional complex valued variables and 18 real 
valued parameters. For this case, we found several numerical issues when including more 
Fourier models in the reduced models. First, Fourier modes m_i and Ui exhibit very different 
frequency compared with that of Mq, which can be seen in Fig. [HI As a consequence, the 
variance of each component has different scales (see Table HD. The disparity in covariance 
scaling becomes exceedingly large when the temperature is low. As a result, much smaller 
time steps are needed in the estimation procedure to sample the observations for m_i and 
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Ml. On the other hand, the procedure has to be continued for a long time period to make 
sufficient observations of mq- A more flexible estimation method would be more useful in 
this case. 

A second related issue is that these three multiscale Fourier modes are correlated and this 
suggests that one may need a different ansatz for the parametric models. For example, one 
may need to consider fully correlated noises in the equations for in fl3.23p . which means 
more parameters to £t. An alternative way to overcome this issue is to fit the model in 
fl3.23p to the uncorrelated observations that can be obtained by rescaling the observations 
with the covariance matrix. In particular, we define our observations as follows, 

Vj = Uj + ej, (6.34) 

where Uj = is the rescaled of Uj by the equilibrium covariance matrix C that can 

be computed empirically through time averaging of a long time series, assuming the sta- 
tionarity and ergodicity of the underlying dynamics. Note that this rescaling improves the 
identihability of uq that has much larger variance relative to (again, see Table HI]) 

since the equilibrium covariance of the rescaled variables Uj is an identity covariance matrix, 
X. In our numerical experiment below, we assume that the observation error covariance to 
be 10% of identity, R = 0.1X and we fit the rescaled observations in fl6.34l) to (13.231) . where 
the observation time interval is chosen to be Tabs = 0.02 and the training data set is 100000 
data points. 

To conhrm the success of the hltering procedure, we see that the filter estimate for R 
converges to the true value, i? = 0.1X, and the hlter estimates for Uj have an equilibrium 
covariance that is indeed identity, exactly equals to the equilibrium covariance of Uj. The 
last point here, however, does not imply that the solutions of the reduced model in (I3.23p . 
integrated with the estimated parameters, will have an identity equilibrium covariance ma¬ 
trix. To verify the predictive skill of the resulting parametric model in fl3.23p . we rescale 
the solutions, Uj, to the appropriate scaling of the underlying dynamics with the following 
covariance transformation, = where C is the equilibrium covariance of iij. 

In our numerical experiment, we compute this statistics, C, by averaging the solutions, Uj, 
of (I3.23P at 6000-10000 model time units, at discrete time step of Tots = 0.02. In particular, 
the solutions Uj are obtained by integrating the model in fl3.23p with parameters determined 
by averaging over the last 1000 steps of the hlter estimates. 
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In Fig. IHl an example of the solntions from the parametric model fl3.23p is shown; here we 
compare the estimates Uj (black solid line) with the truth uj (red dashes) at an arbitratry 
period of time interval. Notice that even if we don’t expect a path-wise agreement, the 
qualitative behavior of the solutions are reasonably reproduced (in the sense that their 
magnitude and frequency are qualitatively comparable). In Fig. [H] the comparison of the 
histogram and the time correlation functions to those of the full model is demonstrated. 
Notice that despite the difference in amplitude and temporal scalings between modes uq 
and the nontrivial marginal distributions are well captured. The correlation times 

for mode Uq are well captured at least until 10 unit time; for the other modes, {n_i,ni}, 
the correlation times are in agreement for about one period of oscillation (approximately 
up to one unit time). We also report the hrst four moments estimates compared to those 
of the truth for each variables in Table m The exact agreement in terms of variances are 
not surprising since we purposely scale the estimates to match the covariance of the true 
dynamics. However, the agreement in terms of the higher order moments such as skewness 
and kurtosis is nontrivial. 

TABLE II. Equilibrium statistics predicted by the model in (|3.23p compared to those of the full 
model. 



Rett_i 

Reuo 

Retti 

statistics 

truth 

estimate 

truth 

estimate 

truth 

estimate 

mean 

0.0008 

0.0005 

-0.0036 

0.0164 

0.0019 

-0.0004 

variance 

10.5646 

10.5646 

487.4128 

487.4128 

9.3998 

9.3998 

skewness 

-0.0002 

-0.0003 

0.0004 

-0.0009 

-0.0007 

-0.0002 

kurtosis 

1.6594 

1.8106 

1.5000 

1.5000 

1.6999 

1.8017 


VII. SUMMARY AND DISCUSSION 

This paper presented a modeling approach that blends some physical knowledge about the 
underlying dynamics and the availability of training data to predict low-frequency modes 
of the NLS equation. In particular, we use the Mori-Zwanzig formalism as guidelines to 
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FIG. 8. Solutions of the reduced model in (I3.23h . compared to those of the full model. 
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FIG. 9. Predicted marginal distributions and correlation functions for /3 = l/20 from the reduced 
model in (j3.23p 
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construct effective parametric models and apply an adaptive ensemble Kalman filter to 
estimate the parameters. The novelty here is that we approximate the memory term and 
the orthogonalized dynamics of a generalized Langevin equation obtained from the Mori- 
Zwanzig expansion with a rational function and a colored noise, respectively. It turns out 
that the resulting parametric model here is an example of the physics constrained nonlinear 
regression modeling approaches proposed in [2^, [2^. This serendipity allows one to use 
the stability conditions established in to ensure non-blow up solutions of the resulting 
parametric model. Compared to the full GLE, these models have advantages in practical 
implementations because they do not involve memory. 


The climatological forecasting skill of the proposed parametric model was verihed in 
terms of the hrst four moments, marginal densities, and correlation functions for various 
temperatures. For low temperature case, high predictive skill of Fourier mode uq is obtained 
with a reduced model with a scalar parameterization for the memory term fl3.16p . For higher 
temperature case where the scale-gap is smaller than the low temperature case, the problem 
becomes more challenging. In this situation, we showed that one can improve the estimates 
either with a two-dimensional parameterization for the memory term in (I3.19p or with htting 
more modes into a model with more retained modes in (I3.23p . 


With the encouraging results in this paper, we plan to apply this modeling strategy on 


other applications such as on coarse-grained biomolecular models 


11 


14| in our future 


research. In general problems, however, the success of this modeling approach will depend 
mostly on the choice of the ansatz for modeling the memory terms. As it has been theo¬ 
retically established in j^, if the ansatz is adequate, then it is possible to obtain, both, 
accurate climatological statistical forecasting and optimal hltering. Our NFS example in 
this paper empirically suggested that our ansatz is optimal in this case. Other potential 
issue is in the parameter estimation strategy which can be expensive when more observa¬ 
tions are included. While many cheaper parameterization methods are available (such as 
regression-based or maximum likelihood-based algorithms), these methods are often inferior 
to the adaptive method applied in the present work even when adequate ansatz is used 
as shown in |^. Therefore, improving the numerical efficiency of the adaptive parameter 


estimation scheme that we used here 


26l | or its variant (see e.g., [3^, [3^) will be the key for 


successful applications in more complex problems. 
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Appendix A: Remarks on the second memory terms in (12.lip 

Mathematically, one can also include the second memory term using the similar rational 
approximation for the high temperature case when this term is not negligible. Denoting the 
other kernel function as, 



( 1 . 1 ) 


where (3 is an additional parameter and approximate the Laplace transform of the kernel 
function, 



( 1 . 2 ) 
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where we assume that a and /3 are real valued parameters. The function g{t) follows the 
differential equation, 

g = ag-i/3\uo\^uo. (1.3) 

Adding white noises into fll.3p . we obtain a parametric model given by, 

Mo = - ic-uo -id\uo \^uq + bf + (3g 
< f=af-b*Uo + a,Wf (1-4) 

g=ag-if3\uo\^Uo + a2Wg, 

where we have added an equation of g to represent the second memory term in fl 2 . 1 ip . The 
problem here is that the nonlinear terms do not conserve energy since we can not control 
the nonlinear terms in the equation for g unless for /9 7 ^ 0. We suspect that there probably 
exists different approximations (other than the rational functions) for these kernel functions 
that give stable parametric models and these are beyond the scope of this paper. Based on 
this consideration, we do not implement the parametric model in (ll.4p in this paper. 


Appendix B: Pseudo-algorithm for parameter estimation 


This Appendix provides a pseudo-algorithm of the estimation method proposed in 
Consider the following hltering problem. 



Xj = fixj-i) + Tek, efc~A^(0,(5), (2.1) 

Vj = Hxj + e°, e°~A/’(0,i?), 

where Xj = {xj,9dj) denote the augmented state and deterministic parameters. Here, we 
assume a persistence model for the deterministic parameters, 0dg = 0dg-i- We attempt to 
estimate Xj as well as Q and i?, on-the-fly. Essentially, Q and R are the stochastic parameters 
through the following relation, 

p p 

R=J2R^es,i. 

i=l i=\ 

and our aim is to estimate i = l,...,p. For the model in fl3.17p . the augmented state- 
parameters are x= (Re{Mo} 5 liii{'Wo}!R-e{/},Im{/},ai,a 2 ,&i,& 2 ,C 5^^)^5 the number of stochas- 
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tic parameters are p = 2, where 9s,i = erf, 9s ,2 = -R, and 


foo^ 0 0 ... o\ 

r= , Qi=l2, Q2 = Ri = 0, R2 = 1 . (2.2) 

vooo^o...o; 

Starting with time index j = 1, we provide an ensemble of prior statistical estimates, 
of size K for the primary hlter and prior mean {9s,i,j}^^i and covariance Qj =Tp, 
for the secondary hlter. The primary hlter for estimating Xj is described in Steps 1-3, while 
the secondary hlter for estimating 9s is described in Steps 4-9. 


1. Apply the ETKF to obtain the analysis ensemble estimate, Let’s denote 

the corresponding Kalman gain and innovation as follows. 




2=1 


where x, =K denotes the prior ensemble average. See for the detail 


ETKF algorithm. 


2. Propagate each ensemble member with the deterministic part of the model in fl2.ip to 
obtain, 

x)i, = f{ke^), k = l,...,K, 

and form the posterior ensemble by adding a Gaussian noise, 

2=1 

3. Dehne an ensemble approximation for the linear tangent model. 


A,=V!(x*)^Uf^,W], (2.3) 

where each column vectors of and Wj are the deterministic forecast ensemble 
perturbations and the analysis ensemble perturbations, consecutively. In fl2.3l) . we 
denote pseudo-inverse by f. 


4. Dehne Kj = AjKj and (j)j = Aj — KjH. 






5. For each i = 1,... ,p, construct an observation operator for starting with = 0, 
let 


= H Mijfi + Ri , 

^*,,+1,0 = -007 +rg.r^+ k^r^kJ . 

6 . For each i = construct an observation operator for where k>l. Set 

^i,j,£ ~ HMiJ/ 

7. Approximate ^{vjvj) = J2^=i^i,j,o^s,i,j- Suppose if ej = {e^,...,e^)~^ is m-dimensional. 
Dehne 






.2^1 


A 


1 7n 2 77X 

~ • • ’S' ■ • ■ ’SS— 


mm \T 

g g-rJ 


8 . Consider the pseudo observation model for the secondary hlter, 


(^j,e = J^j,£Ss + r]j,i, hi/~-^(0,^0/), £=!,...,L, (2.4) 

where in our case, crj-£ = nec(ejeJ_^) eM+, Rj^i= .. ,Fpj^i), 0^ = {9^,1,... ,9^^^ 

and for each pair of indices {k,i}, construct 

fFjy = lE(ejeJ )E(ej_£eJ_^) + E(ejeJ 


Note that W is constructed, assuming Gaussian and independent noises, rjj^i. Compo¬ 
nents of matrix W in fl2.5p can be rewritten as follows. 




= E(£“£j)E(/_,£P,) + E(£“£5)E(/£j) Vo. 


9. Perform a secondary Kalman hlter L-times to sequentially update 6 ^ 5,1 j+i with observa¬ 
tion models in fl2.4p one at the time, assuming that the dynamics of these parameters 
are persistence, 9s^i = 0. Now we can repeat Step 1 above for the new assimilation 
time. 
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